Cluster-Based Pattern Recognition in Natural Language Text
Acknowledgements

I would like to thank my adviser, Prof. Tishby, for his guidance and assistance in producing this work, and for his suggestions and positive input. I would also like to thank Beata Beigman Klebanov for her constant help and advice throughout this work, including (but definitely not limited to) the contribution of the parsed data used here. Also deserving of thanks are my family, for their support, and especially my grandmother, Rose Brody, for her confidence in my achievements.

Abstract

This work presents the Clustered Clause structure, which uses information-based clustering and dependencies between sentence components to provide a simplified and generalized model of a grammatical clause. We show that this representation enables us to detect complex textual relations at a higher level of context. The relations we detect are of interest in themselves, as linguistic phenomena, and are also highly suited for use in certain linguistic and cognitive tasks. We define and search for several types of patterns, moving from basic patterns to more complex ones, and from patterns within the sentence to those involving entire sentences. We present examples of recognized patterns of each type, along with descriptions of several interesting phenomena detected by our method. We assess the quality of the results and demonstrate the importance of the clustering and dependency model we chose. The principles behind our method are largely domain-independent, and can therefore be applied to other forms of structured sequential data as well.
Related Resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...
Kernel Fuzzy C-Means Clustering for Word Sense Disambiguation in
Word sense disambiguation (WSD) in biomedical texts is important. Most existing research focuses on supervised learning methods and knowledge-based approaches. Implementing these methods requires a substantial human-annotated corpus, which is not easily obtained. In this paper, we developed an unsupervised system for WSD in biomedical texts. First, we predefine the number of ...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
Pattern Recognition Applied to the Acquisition of a Grammatical Classification System from Unrestricted English Text
Within computational linguistics, the use of statistical pattern matching is generally restricted to speech processing. We have attempted to apply statistical techniques to discover a grammatical classification system from a Corpus of 'raw' English text. A discovery procedure is simpler for a simpler language model; we assume a first-order Markov model, which (surprisingly) is shown elsewhere t...
Publication date: 2005